Abstract

It has been suggested that the minimization of the probability for lethal
mutations is a major constraint shaping the genetic code. Indeed, the genetic
code has been found highly protective against transitions. Here, we show
that data on polymerase-induced frameshifts provide a rationale for the codon
assignment of chain termination signals (CTS).

We work on the assumption that the mutational spectra of in vitro polymerization
for DNA-polymerases belonging to families found in at least two of the three
living kingdoms are relevant with respect to primordial polymerases. We do not
take into account here DNA polymerases beta, which are family X DNA polymerases
found so far exclusively among eukaryotes, or HIV-reverse transcriptases, which
are thought to be active as dimers [14] and which emerged very 'late' in evolution.

As it is believed that RNA preceded DNA in evolution, data for RNA-replicases
would be more appropriate but are not available. Recent evidence shows, however, that
DNA- and RNA-replicases are very closely related: a single tyrosine to phenylalanine
substitution changes DNA-replicases into RNA-replicases [17]; E. coli DNA-
polymerase I is also an accurate RNA-dependent DNA-polymerase [18]. A single
mutation confers on the MMLV-reverse transcriptase the ability to replicate RNA.

Polymerase-induced mutations are mainly substitutions and frameshifts.
The frameshift error rate is about half the substitution error rate for the Klenow-
polymerase domain, which has no nuclease domain, as can be assumed for a primordial
polymerase. Frameshifts result mostly from the addition or deletion of one
base. Frameshifts are highly deleterious as they prevent translation in the correct
reading frame of the codons downstream of the mutation. They happen in directly
repeated and palindromic sequences (where the assumption of polymerase-error
tolerance can be shown to be consistent) and in non-reiterated runs, where single-base
deletions occur more frequently than single-base additions. Additions will
therefore be neglected here. For polymerases with or without nuclease domains we
noticed no significant differences in the consensus sequence for single-base deletion
sites in non-reiterated runs. It has been defined as YR, TTR, YTG, and TR,
and refined here from the current data as YTRV (Y = C or T; R = A or G; V = C, A or G).
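
As an illustration of the YTRV consensus, the following minimal Python sketch locates candidate single-base deletion sites in a DNA string, on the given strand and on its reverse complement. The function names are hypothetical and not taken from the original work; the only assumptions are the standard IUPAC meanings Y = C or T, R = A or G, and V = A, C or G.

```python
import re

# IUPAC codes from the text: Y = C or T, R = A or G, V = A, C or G.
# The lookahead makes finditer report sites that overlap by one base
# (e.g. CTACTAC holds CTAC at positions 0 and 3).
YTRV = re.compile(r"(?=[CT]T[AG][ACG])")

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def ytrv_sites(seq: str) -> list[int]:
    """0-based start positions of YTRV matches on the given strand."""
    return [m.start() for m in YTRV.finditer(seq.upper())]


def ytrv_sites_both_strands(seq: str) -> tuple[list[int], list[int]]:
    """Sites on the given strand and on its reverse complement.

    Positions for the complementary strand are reported in that strand's
    own 5'->3' coordinates.
    """
    seq = seq.upper()
    return ytrv_sites(seq), ytrv_sites(reverse_complement(seq))


if __name__ == "__main__":
    print(ytrv_sites_both_strands("ACTTGATT"))
```

Scanning both strands reflects the expectation that a site on either strand can yield a deletion during replication.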

If the genetic code has been optimized for frameshift tolerance, then it should allow
most amino acid sequences to be encoded without using YTRV sequences, or the TRV,
YTR and NYT potential deletion site codons (PDSC), and without using their
reverse-complementary (rc) sequences, which are also expected to yield deletions
during replication (Fig. 1):

If the base T is the first base of a codon and the previous codon has a
pyrimidine as its third base, then the amino acid should be encoded without using
the six codons TRV; if the base T is the second base of a codon and the first
base of the following codon is C, A or G, then the amino acid should be encoded
without using the four codons YTR; if the base T is the third base of a codon and
the following amino acid has an RVN-type codon, then the amino acid should be
encoded without using the eight codons NYT.
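
The three codon classes can be enumerated directly from their IUPAC patterns. The short Python sketch below does so and confirms the counts of six TRV and eight NYT codons (and four YTR codons); the helper name `expand` and the dictionary `PDSC` are illustrative choices, not part of the original work.

```python
from itertools import product

# Standard IUPAC nucleotide codes used in the text.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "AG", "Y": "CT", "V": "ACG", "N": "ACGT"}


def expand(pattern: str) -> list[str]:
    """All concrete codons matching an IUPAC pattern such as 'TRV'."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in pattern))]


# Potential deletion site codons (PDSC), one class per position of T.
PDSC = {pattern: expand(pattern) for pattern in ("TRV", "YTR", "NYT")}

if __name__ == "__main__":
    for pattern, codons in PDSC.items():
        print(pattern, len(codons), " ".join(codons))
```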

The most deleterious codons are TAA and TAG and their rc-sequences TTA and
CTA, which are both PDSC and rc-PDSC. Deletions at codons encoding amino acids are
likely to yield non-functional proteins, as none of the downstream codons is
translated in the correct frame.
However, deletions at codons encoding CTS should result in the addition of peptides to
the proteins' carboxy-termini, thereby likely providing functional proteins. The least
deleterious effects are therefore obtained by assigning the most deleterious codons to
CTS and not to amino acids. Frameshift tolerance may therefore have been the major
constraint in the codon assignment of CTS. The codons TTA and CTA encode leucine,
which has the highest, six-fold degeneracy. Frameshift tolerance may therefore be one of
the constraints imposing a high degeneracy on such amino acids.
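
The statement that TAA, TAG, TTA and CTA are the codons that are both PDSC and rc-PDSC can be checked by brute force. The sketch below, with hypothetical helper names, builds the PDSC set from the TRV, YTR and NYT patterns, takes reverse complements, intersects the two sets, and looks up each resulting codon in the standard genetic code (written in the usual compact TCAG ordering, with `*` for a chain termination signal).

```python
from itertools import product

IUPAC = {"T": "T", "R": "AG", "Y": "CT", "V": "ACG", "N": "ACGT"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def expand(pattern: str) -> list[str]:
    """All concrete codons matching an IUPAC pattern such as 'TRV'."""
    return ["".join(p) for p in product(*(IUPAC[c] for c in pattern))]


def reverse_complement(seq: str) -> str:
    return seq.translate(_COMPLEMENT)[::-1]


# Potential deletion site codons and their reverse complements.
pdsc = {c for pattern in ("TRV", "YTR", "NYT") for c in expand(pattern)}
rc_pdsc = {reverse_complement(c) for c in pdsc}
both = sorted(pdsc & rc_pdsc)

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

if __name__ == "__main__":
    for codon in both:
        print(codon, CODE[codon])
```

Under the standard code the intersection contains exactly two termination codons and two leucine codons. TGA, the third termination codon, is a PDSC but not an rc-PDSC, since its reverse complement TCA matches none of the three patterns.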

We point out, however, that substitution tolerance and frameshift mutation tol- 
erance are to be considered as competing constraints on the selection of an optimal 
genetic code: substitution tolerance favors a code in which an amino acid is encoded 
by triplets differing only by single-base mutations. On the other hand, given that
single-base deletions in non-reiterated runs occur mostly on a specific template
sequence (YTRV), tolerance of these frameshift mutations favors a code in which amino
acids are encoded by triplets differing strongly from one another, so that amino acid
sequences are more likely to be encodable without using the YTRV sequences.
Our argument is based on the consideration that, for the consensus sequences, the 
ratio between single-base deletions and base substitutions is much greater than in 
other sequences, so that, for the assignment of specific codons, frameshift mutation 
tolerance should be a stronger constraint than substitution tolerance. 
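
Whether a given amino acid sequence can be encoded without any YTRV site can be explored with a small dynamic program over synonymous codons. The following is a sketch under stated assumptions, not the authors' procedure: it uses the standard genetic code, treats a site on either strand as forbidden (YTRV on the coding strand, or its reverse complement BYAR with B = C, G or T), and ignores all other sequence constraints. All function names are hypothetical.

```python
import re
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
CODONS: dict[str, list[str]] = {}
for codon, aa in CODE.items():
    CODONS.setdefault(aa, []).append(codon)

# YTRV on the coding strand, or BYAR, which puts YTRV on the other strand.
SITE = re.compile(r"[CT]T[AG][ACG]|[CGT][CT]A[AG]")


def has_site(dna: str) -> bool:
    """True if the DNA string contains a forbidden 4-base site."""
    return SITE.search(dna) is not None


def translate(dna: str) -> str:
    return "".join(CODE[dna[i:i + 3]] for i in range(0, len(dna), 3))


def encode_avoiding_sites(peptide: str):
    """Return one site-free coding sequence for the peptide, or None.

    A 4-base site spans at most two adjacent codons, so it is enough to
    test every pair of neighbouring codons (a 6-base window).
    """
    if not peptide:
        return ""
    # Map each reachable last codon to one site-free sequence ending in it.
    paths = {c: c for c in CODONS[peptide[0]]}
    for aa in peptide[1:]:
        nxt = {}
        for c in CODONS[aa]:
            for prev, dna in paths.items():
                if not has_site(prev + c):
                    nxt[c] = dna + c
                    break
        if not nxt:
            return None
        paths = nxt
    return next(iter(paths.values()))


if __name__ == "__main__":
    print(encode_avoiding_sites("MLY"))
```

Keeping only the last codon as state is sufficient because the constraint never reaches beyond neighbouring codons; the sketch returns an arbitrary valid encoding rather than an optimal one.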
In conclusion, these results provide an insight into the constraints yielding the
genetic code's fixation and suggest that the codon assignment of CTS may be con- 
temporary with the emergence of polymerases being enzymes rather than ribozymes. 
Polymerase-error tolerance arguments similar to the one presented here may be use- 
ful in the investigation of alternative terrestrial or exobiological genetic codes and 
possibly also for the engineering of new genetic codes. 

Tables available from authors via email and mail.